Towards Continual Reinforcement Learning: A Review and Perspectives
In this article, we aim to provide a literature review of different
formulations and approaches to continual reinforcement learning (RL), also
known as lifelong or non-stationary RL. We begin by discussing our perspective
on why RL is a natural fit for studying continual learning. We then provide a
taxonomy of different continual RL formulations and mathematically characterize
the non-stationary dynamics of each setting. We go on to discuss evaluation of
continual RL agents, providing an overview of benchmarks used in the literature
and important metrics for understanding agent performance. Finally, we
highlight open problems and challenges in bridging the gap between the current
state of continual RL and findings in neuroscience. While still in its early
days, the study of continual RL holds the promise of developing better
incremental reinforcement learners that can function in increasingly realistic
applications where non-stationarity plays a vital role, including healthcare,
education, logistics, and robotics.
Comment: Preprint, 52 pages, 8 figures
Discovering Object-Centric Generalized Value Functions From Pixels
Deep Reinforcement Learning has shown significant progress in extracting
useful representations from high-dimensional inputs, albeit using hand-crafted
auxiliary tasks and pseudo rewards. Automatically learning such representations
in an object-centric manner geared towards control and fast adaptation remains
an open research problem. In this paper, we introduce a method that tries to
discover meaningful features from objects, translating them into temporally
coherent "question" functions and leveraging the subsequently learned general
value functions for control. We compare our approach with state-of-the-art
techniques alongside other ablations and show competitive performance in both
stationary and non-stationary settings. Finally, we also investigate the
discovered general value functions, and through qualitative analysis show that
the learned representations are not only interpretable but also centered
around objects that are invariant to changes across tasks, facilitating fast
adaptation.
Comment: Accepted at ICML 2023
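For context, a general value function (GVF) predicts the discounted sum of an arbitrary cumulant signal rather than the environment reward. A minimal tabular TD(0) update for a single GVF "question" might look like the sketch below; this is illustrative only, as the paper learns such predictions end-to-end from pixels with discovered object-centric cumulants:

```python
import numpy as np

def gvf_td_update(v, s, s_next, cumulant, gamma, alpha=0.1):
    """One TD(0) step for a general value function (GVF).

    A GVF answers a 'question' (cumulant, continuation gamma) instead of
    predicting environment reward. Tabular sketch; the paper's version is
    learned from pixels with object-centric cumulants.
    """
    td_error = cumulant + gamma * v[s_next] - v[s]
    v[s] += alpha * td_error
    return v
```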
Options of Interest: Temporal Abstraction with Interest Functions
Temporal abstraction refers to the ability of an agent to use behaviours, or
controllers, that act for a limited, variable amount of time. The options
framework describes such behaviours as consisting of a subset of states in
which they can be initiated, an internal policy, and a stochastic termination
condition. However, much of the subsequent work on option discovery has ignored
the initiation set because of the difficulty of learning it from data. We provide
a generalization of initiation sets suitable for general function
approximation, by defining an interest function associated with an option. We
derive a gradient-based learning algorithm for interest functions, leading to a
new interest-option-critic architecture. We investigate how interest functions
can be leveraged to learn interpretable and reusable temporal abstractions. We
demonstrate the efficacy of the proposed approach through quantitative and
qualitative results, in both discrete and continuous environments.
Comment: To appear in Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20)
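In spirit, an interest function relaxes an option's binary initiation set into a learned, differentiable weight on each state. Below is a toy sketch of interest-weighted option selection (hypothetical names and shapes; the paper's interest-option-critic additionally derives policy gradients for training these weights):

```python
import numpy as np

def select_option(state_features, interest_weights, option_logits, rng):
    """Toy interest-weighted option selection.

    interest(s, o) = sigmoid(w_o . phi(s)) generalizes a binary initiation
    set: options with higher interest in a state are chosen more often.
    Hypothetical sketch, not the paper's exact architecture.
    """
    interest = 1.0 / (1.0 + np.exp(-interest_weights @ state_features))
    base = np.exp(option_logits - option_logits.max())
    base /= base.sum()                   # softmax base policy over options
    probs = interest * base
    probs /= probs.sum()                 # reweight by interest, renormalize
    return rng.choice(len(probs), p=probs)
```

Plugging such a selection rule into an option-critic learner would encourage options to specialize in the state regions where their interest is high, which is the behaviour the paper aims for.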
Learning Generalized Temporal Abstractions across Both Action and Perception
Learning temporal abstractions that are partial solutions to a task, and that can be reused for other similar or even more complicated tasks, is intuitively an ingredient that can help agents plan, learn and reason efficiently at multiple resolutions of perception and time. Just as humans acquire skills and build on existing skills to solve more complicated tasks, AI agents should be able to learn and develop skills continually, hierarchically and incrementally over time. In my research, I aim to answer the following question: How should an agent efficiently represent, learn and use knowledge of the world in continual tasks? My work builds on the options framework, but provides novel extensions driven by this question. We introduce the notion of interest functions. Analogous to temporally extended actions, we propose learning temporally extended perception. The key idea is to learn temporal abstractions unifying both action and perception.
Learning Options with Interest Functions
Learning temporal abstractions that are partial solutions to a task and can be reused for solving other tasks is an ingredient that can help agents plan and learn efficiently. In this work, we tackle this problem in the options framework. We aim to autonomously learn options that specialize in different regions of the state space by proposing a notion of interest functions, which generalizes initiation sets from the options framework to function approximation. We build on the option-critic framework to derive policy gradient theorems for interest functions, leading to a new interest-option-critic architecture.
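Schematically, and in illustrative notation that may differ from the paper's, the interest function reweights the policy over options, and gradients flow through the resulting option-selection distribution:

```latex
% Interest-weighted policy over options (notation is illustrative):
%   I_z(s, o)       : interest of option o in state s, with parameters z
%   \pi_\Omega(o|s) : base policy over options
% The effective option-selection policy renormalizes by interest,
% and a REINFORCE-style gradient trains the interest parameters:
\[
  \pi_{I}(o \mid s) \;=\;
  \frac{I_{z}(s, o)\, \pi_{\Omega}(o \mid s)}
       {\sum_{o'} I_{z}(s, o')\, \pi_{\Omega}(o' \mid s)},
  \qquad
  \nabla_{z} J \;=\;
  \mathbb{E}\!\left[\, \nabla_{z} \log \pi_{I}(o \mid s)\, Q_{\Omega}(s, o) \right].
\]
```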
Self-Supervised Attention-Aware Reinforcement Learning
Visual saliency has emerged as a major visualization tool for interpreting deep reinforcement learning (RL) agents. However, much of the existing research uses it as an analysis tool rather than as an inductive bias for policy learning. In this work, we use visual attention as an inductive bias for RL agents. We propose a novel self-supervised attention learning approach which can (1) learn to select regions of interest without explicit annotations, and (2) act as a plug-in for existing deep RL methods to improve learning performance. We empirically show that self-supervised attention-aware deep RL methods outperform the baselines in both convergence rate and final performance. Furthermore, the proposed self-supervised attention is neither tied to a specific policy nor restricted to a specific scene. We posit that the proposed approach is a general self-supervised attention module for multi-task learning and transfer learning, and we empirically validate its generalization ability. Finally, we show that our method learns meaningful object keypoints, demonstrating improvements both qualitatively and quantitatively.
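As a rough illustration of attention as an inductive bias (a hypothetical module, not the paper's architecture), a learned spatial mask can gate convolutional features before they reach the policy head:

```python
import torch
import torch.nn as nn

class AttentionGatedEncoder(nn.Module):
    """Gate CNN features with a learned spatial attention mask.

    Hypothetical sketch of attention as an inductive bias for an RL policy;
    the paper trains its attention with a self-supervised objective rather
    than explicit annotations.
    """

    def __init__(self, in_channels=4, hidden=32):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # 1x1 conv producing a single-channel spatial attention map.
        self.attention = nn.Sequential(
            nn.Conv2d(hidden, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, obs):
        h = self.features(obs)
        mask = self.attention(h)        # (B, 1, H, W), values in [0, 1]
        return h * mask                 # policy head sees attended features
```

Because the mask is produced from the features themselves rather than from a particular policy, such a module could in principle be reused across tasks, which is the kind of generalization the abstract emphasizes.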